Abstract


The Student Evaluation of Teaching (SET), although controversial, is a common practice at the higher 
education level for faculty appraisals and promotions, but seldom at secondary school level. Concerns 
have been raised as to whether students are informed and experienced enough to evaluate teachers’ 
teaching practices in a reliable way and arrive at valid outcomes. The purpose of this research was to 
explore the reliability of students’ evaluations of mathematics teaching at secondary school level. This 
research involved eight teachers and 194 Grade 11 students from eight secondary schools in Bojanala 
District, North West province in South Africa. A SET questionnaire was developed, validated and used 
for data collection. The data were analysed by calculating the average deviation index of the students’ 
evaluations of each teacher per item and the Intraclass Correlation Coefficient (ICC) with SPSS. This 
was done using one-way random effects, absolute agreement and a multiple raters/measurements model. 
Both the ADI and ICC values showed a high degree of reliability of the SET. Hence, SET at secondary 
school level may provide a reliable indication of teachers’ educational practices that might be used 
for the formative assessment of teachers’ instruction. It can also assist in designing teacher training 
programmes for pre-service teachers and professional development programmes for in-service teachers. 
Keywords: Average Deviation Index, Intraclass Correlation Coefficient, reliability of SET, secondary 
school, student evaluation of teaching (SET). 


Introduction 


Ensuring that students are offered quality education is a priority for every government 
and institution. One way of achieving this priority is by evaluating teachers’ educational 
practices. The evaluation of teachers’ educational practices could be achieved through Student 
Evaluation of Teaching (SET). SET is often used at the higher education level to appraise 
lecturers’ instruction practices and for faculty appraisals and promotions (Mandouit, 2018; 
Zabaleta, 2007). According to Penny and Coe (2004), the use of SET “as an indicator of teaching 
quality is now a common feature in universities around the world” (p. 215) and the results from 
SETs have been used to make critical judgements in higher education (Beran & Rokosh, 2009). 
In the United States, for example, SET is used as a major source of teaching evaluation by the 
majority (94.2%) of four-year liberal arts colleges (Miller & Seldin, 2014). 

The evaluation of the education system in general faces the challenge of assessing the 
system in a valid and reliable manner (Taut & Rakoczy, 2016). In particular, the validity and 
reliability of SET is a contentious matter (Hattie, 2009; Hornstein, 2017). Some researchers 
believe that the way in which students perceive effective teaching may be unrelated to good 
teaching (Ko, Sammons, & Bakkum, 2013). For example, Beecham (2009) observes that SET 
can become a measure of “customer [student] satisfaction” instead of a measure of educational 
quality (p. 135). 
Even though some researchers have raised concerns over the biases that may affect the 
results of SET, other researchers who support SET adjudge it as the most acceptable criterion 
for measuring the effectiveness of teaching. As noted by Theall and Franklin (2001), students 
are the most qualified source to rate the extent to which teaching is productive, informative, 
satisfying or worthwhile. According to Theall and Franklin, studies have shown that students’ 
assessments of the amount learned in a subject and their overall evaluations of teachers are 
consistently highly correlated. Felder and Brent (2004) opined that SET should be an essential 
component of teaching evaluation because “students are in a better position than anyone else 
to judge certain aspects of teaching, such as how clear, interesting, respectful, and fair a course 
instructor is, and they’re the only ones who can say how an instructor has influenced their 
attitude towards the course subject, their motivation to learn it, and their self-confidence” (p. 
200). Similarly, Prebble et al. (2004) argued that SET is among the most reliable and accessible 
indicators of teachers’ teaching effectiveness. This is due to the fact that SET might give a more 
holistic rating of the teachers’ educational practices than other methods of teaching evaluation, 
like peer evaluation or teachers’ self-evaluation. According to Quaglia and Corso (2014), SET 
can provide relevant information relating to teachers’ instructional practices because students 
spend more time with the teacher in the classroom. Thus, their evaluations are more likely to 
adequately give an indication of the teacher’s educational practices. 

Many studies at higher education level have provided empirical evidence of the reliability 
and validity of SET (Arreola, 2007; Benton & Ryalls, 2016). However, the use of SET at 
secondary school level is rare (MET project, 2010) and studies on SET at secondary school 
level are sparse. The rare use of SET at secondary school level and the paucity of research 
of SET at that level of education may be due to concerns that students at school level might 
not be informed and experienced enough to give reliable and valid evaluations of teachers’ 
instructional practices and the educational systems supporting them. Nevertheless, some 
studies have shown that secondary school students, and even primary school pupils, can give 
reliable evaluations of teachers’ educational practices. For example, Peterson, Wahlquist, and 
Bone (2000) conducted 9,765 student surveys at elementary, middle and high school levels and 
found that SET was a reliable measure of teacher educational practices. Similarly, Kyriakides 
(2005) found that surveying primary school students is a reliable measure of their teachers’ 
educational practices in Cyprus. In international research conducted in six European countries, 
Kyriakides et al. (2014) found that primary school students’ evaluation of teaching was reliable 
and valid. Other research (Irving, 2004; Wilkerson, Manatt, Rogers, & Maughan, 2000) has 
shown that secondary school students are capable of giving reliable and valid evaluation of 
teachers’ educational practices. 

In South Africa, SET has not been practiced at secondary school level and hence has not 
been the focus of any research. However, the evaluation of the quality of educational practices 
provided by teachers was part of the broader objectives of the Integrated Quality Management 
System (IQMS) of public schools in South Africa (Education Labour Relations Council, 2003). 
As observed by Mji (2011) and Mpungose (2014), however, the system has not achieved its 
intended purpose and has been the subject of much criticism regarding the objectivity of the 
evaluation of the quality of teachers’ educational practices. Under the IQMS, teachers’ 
self-evaluations and peer evaluations are used to appraise the teachers’ educational practices. 

SET results can be used as an input for the government and other stakeholders to create 
policies regarding teachers’ promotion and retention. SET might give insight into teachers’ 
educational practices and the quality of education that teachers provide to students. In addition, 
SET may be convenient in many educational settings and is also cost-effective. However, for 
SET to be used at secondary school level, its reliability and validity will have to be scrutinised 
in order for it to be appropriately supported and carried out. There is a paucity of literature on 
the use of SET at secondary school level (Peterson, Wahlquist, & Bone, 2000). Hence, to fill 
the research gap, this research explored the reliability of SET at secondary school level using 
an average deviation measure and an inter-rater reliability measure, the Intraclass Correlation 
Coefficient (ICC). 


Background 


Reliability refers to the consistency, stability, and generalisability of measurements or 
assessments (Hobson & Talbot, 2001). The reliability of a procedure or measure is the degree 
to which the procedure or measure yields consistent results (Benton & Li, 2017). Reliability 
describes how far a particular procedure, such as SET, will produce similar results in different 
circumstances, assuming nothing else has changed (Roberts, Priest, & Traynor, 2006). According 
to Bruton, Conway and Holgate (2000), reliability reflects not just the degree of correlation, but 
also an agreement between measurements. A procedure is said to be reliable when its results are 
not dependent on chance or unknown circumstances. The reliability of a procedure is usually 
tested by having many people perform the procedure on identical data and comparing their 
results. Through statistical measure, if the variation in the results exceeds a certain threshold, 
the procedure is judged to be unreliable (Vriezekolk, 2014). 

The aim of this research was to explore the reliability of SET at secondary school 
level. The reliability of SET may be evaluated by considering the consistency of the ratings of 
the students in a class. If there is little variability in the students’ overall ratings of the teacher, it 
implies that the students tend to perceive the teacher’s instruction in the same way and therefore 
their evaluations may be considered reliable. However, if the students vary substantially in their 
overall ratings of the teacher, it means that their evaluations are not reliable and thus less helpful 
in giving a general impression of the teacher’s teaching (Benton & Li, 2017). 

The consistency of SET can be measured by computing the average deviation index 
(ADI) (Burke, Finkelstein, & Dusig, 1999), the within-group interrater reliability coefficient 
(James, Demaree, & Wolf, 1984), and the Intraclass Correlation Coefficient (ICC) (Koo & Li, 
2016) of students’ ratings of teaching, among other measures. 


Methodology of Research 
Research Design 


This research used a quantitative descriptive research method and a survey research design 
to explore the reliability of SET. According to Maree and Pietersen (2016), quantitative research 
is a systematic and objective process of using numerical data from only a selected sub-group of 
a population to generalise the findings to the population. Surveys are used to gather large scale 
data that can be statistically manipulated in order to make generalisations (Creswell, 2015). 
In this research, a questionnaire was used to gather Grade 11 students’ evaluations of their 
mathematics teachers’ educational practices across eight classes and the data were statistically 
analysed to make inferences regarding the SET in question. This research is part of an ongoing 
research project, which commenced in 2010 with the development of instruments. However, 
the data for this research were collected in September 2017. 


Participants 
The research was conducted in secondary schools in the North West province of South 
Africa. Eight Grade 11 mathematics classes from different secondary schools were involved in 
this research. The province and schools were conveniently selected to participate in the research 
because of their proximity to the researcher. Also, the mathematics teachers in the schools 
consented to their classes being used for the research. There were 194 students (aged 16 to 19 
years old) from the mathematics classes in the eight secondary schools who participated in the 
research. The distribution of the participants per teacher (per class) is shown in Table 1. 


Table 1. Participants per teacher.

Teacher   Number of participants
T1         20
T2         39
T3         20
T4         26
T5         19
T6         19
T7         17
T8         34
Total     194





Ethical Considerations 


As part of the ethical procedure of the University of South Africa, permission to conduct 
the research was obtained from the university’s ethics committee and from the provincial 
Department of Basic Education in the province where the research was conducted. Permission 
was also obtained from the principals and teachers whose students participated in the research. 
In addition, before the commencement of the research, the participants were informed in writing 
about the purpose of the research, that their participation in the research was voluntary and that 
they could withdraw from the research at will without prejudice. They were also told that the 
information they provided would be treated as confidential and the report of the research would 
keep them and their schools anonymous. They signed consent forms before participating in the 
research. 


Instrument and Procedures 


The data for this research were collected by means of a SET questionnaire. The 
questionnaire was a six-point Likert type rating scale. It consisted of 27 items that represent 
positive descriptors of teacher behaviour in which the students specified their levels of agreement 
or disagreement with the item statements on the scale: 6 = strongly agree, 5 = agree, 4 = slightly 
agree, 3 = slightly disagree, 2 = disagree, and 1 = strongly disagree. The questionnaire was 
developed by the researcher. 

To develop the instrument, the researcher started with a literature search relating to 
the characteristics of effective teachers and teaching. Specifically, the researcher searched 
for studies that surveyed students’ and teachers’ views of effective teachers, such as that of 
Irving (2004). The researcher then perused official documents such as the South African Norms 
and Standards for Educators documents (DoE, 2000), the National Curriculum Statement for 
mathematics (DoE, 2003) and any other related literature. Finally, the researcher interviewed a 
non-random sample of Grade 8 to Grade 12 mathematics students and teachers. These teachers 
and students were asked to indicate their views of effective teachers. From those sources, a pool 
of 186 items was created. For the 186 items, the researcher started a vetting process involving 
teachers, students and university-based mathematics and science education researchers. The 
vetting process was carried out to ensure that the items were clear, had no ambiguity in meaning 
and that they would be easily understood by the students as recommended by Mogari (2004). 
Furthermore, the researcher did not want an extremely long instrument that could end up 
being unwieldy and taking too long to complete. Initially, the 186 items were scrutinised by 
six teachers (four mathematics teachers and two English language teachers) and 10 secondary 
school students. They recommended some grammatical changes and the removal of some 
items. For instance, it was suggested that the researcher use ‘learners’ instead of ‘students’ as 
this is standard practice in the high school system in South Africa. Following their suggestions, 
the researcher trimmed down the items to 135. 

The researcher then requested four university-based mathematics and science education 
researchers to examine the 135 items. They felt that a number of items were repetitive and 
that the instrument was too long, so some of the items were eliminated, resulting in an 
84-item instrument. The four researchers were then asked to rate each of the items in relation to 
assessing teacher educational practices using a 5 to 1 rating scale where 5 = strongly favourable; 
4 = favourable; 3 = undecided; 2 = unfavourable; and 1 = strongly unfavourable. 

From their rating of the items, the correlation coefficients between the average rating 
for each item and the total (summed) score across all items in each subscale (Trochim, 2006) 
were computed, and all of the items that had correlation coefficients of less than .7 (r < .7) 
were eliminated. This process resulted in 39 items, which were further reduced to 30 by 
eliminating the nine items with the lowest correlation coefficients. This was done to keep the 
instrument short, because potential respondents may be less inclined to participate in a 
long survey (Galesic & Bosnjak, 2009). 
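
The item-total screening described above can be sketched in a few lines of Python. This is only one possible reading of the procedure: the data layout (one row of ratings per rater), the helper names `pearson` and `retain_items`, and the ratings themselves are assumptions made for illustration and are not taken from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlational information
    return sxy / sqrt(sxx * syy)

def retain_items(ratings, threshold=0.7):
    """ratings[j][i] is rater j's rating of item i; returns indices of retained items."""
    totals = [sum(row) for row in ratings]  # summed score per rater
    kept = []
    for i in range(len(ratings[0])):
        item_scores = [row[i] for row in ratings]
        if pearson(item_scores, totals) >= threshold:
            kept.append(i)
    return kept

# Hypothetical ratings from four raters on three items (5 = strongly favourable).
ratings = [
    [5, 3, 4],
    [4, 4, 3],
    [3, 2, 2],
    [2, 5, 1],
]
kept = retain_items(ratings)
print(kept)
```

In this made-up example the second item correlates weakly with the total score and would be dropped, while the other two items are kept.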

A factor analysis was used to further determine if the items in the instrument measured 
the theorised constructs and thus strengthened the validity of the instrument. A principal 
component factor analysis in SPSS was used to determine the factor loadings of the items of 
the instrument. The first step was to carry out a preliminary analysis using the output of the 
R-matrix. The result revealed that three items had one-tailed significance values greater than .05. 
Hence, it was judged better to eliminate the three items to avoid singularity (Field, 2005). The 
final instrument was a 27-item instrument. According to Kaiser’s criterion (Field, 2005), five 
factors with eigenvalues greater than one were extracted from the 27 items. The five factors 
were: teachers’ subject knowledge, lesson preparation, lesson presentation, student assessment, 
and student motivation. 

The reliability of the instrument was established by calculating the Cronbach’s alpha 
value (Cohen, Manion, & Morrison, 2011), using data gathered in a pilot study of the instrument 
from 109 students in four secondary schools. A coefficient alpha value of .95 was obtained. 
Based on the rule of thumb, this alpha value was deemed “excellent” (George & Mallery, 2003, 
p. 231). Hence, the questionnaire was judged to be reliable. 
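
For readers unfamiliar with the statistic, the following minimal sketch shows how Cronbach's alpha is computed from a respondents-by-items table using the standard formula alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the responses are hypothetical; the study itself obtained alpha from SPSS using the pilot data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses[r][i] is respondent r's score on item i (sample variances are used)."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of five students to three 6-point items.
responses = [
    [6, 5, 6],
    [4, 4, 5],
    [3, 3, 3],
    [5, 5, 4],
    [2, 3, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 4))
```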

Data were collected from the students involved in the research at the schools’ premises 
between one and five days after the topic used for the research had been taught. Trigonometry 
was used as the topic because it is a very important aspect of the school mathematics curriculum 
and was perceived as one of the topics that students find challenging (Chauke, 2013; De Villiers 
& Jugmohan, 2012). 


Data Analysis 
The data were analysed by computing the average deviation index (ADI) of the students’ 
evaluations of each teacher per item and the Intraclass Correlation Coefficient (ICC) of all the 
evaluations in SPSS. The ADI is the average absolute deviation from a mean or median. It is a 
measure of interrater agreement for evaluators’ ratings of a single target on a single occasion 
(Burke, Finkelstein, & Dusig, 1999). The students’ evaluations of the teachers in each of the 
schools were used for the ADI analysis. 
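
As an illustration of the definition above, the ADI for one item is the mean of the absolute differences between each student's rating and the centre (mean or median) of the class's ratings. The sketch below assumes a hypothetical helper name, `average_deviation_index`, and made-up ratings; it is not the SPSS procedure used in the study.

```python
from statistics import mean, median

def average_deviation_index(ratings, centre=mean):
    """Average absolute deviation of one item's ratings from their mean (or median)."""
    c = centre(ratings)
    return sum(abs(r - c) for r in ratings) / len(ratings)

# Hypothetical ratings given by five students to one teacher on one item (6-point scale).
item_ratings = [6, 5, 5, 6, 4]

adi_mean = average_deviation_index(item_ratings)            # deviations from the mean
adi_median = average_deviation_index(item_ratings, median)  # deviations from the median
print(round(adi_mean, 2), round(adi_median, 2))
```

The smaller the index, the more closely the students' ratings cluster around a common value.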

ICC is a measure of the reliability of measurements or ratings. It is the ratio of variance 
among subjects (subject variability) over the total variance (Koo & Li, 2016). An ICC is a 
widely used reliability index in inter-rater reliability analyses (Koo & Li, 2016), and ranges 
from zero (no reliability) to one (perfect reliability). The ICC was computed using a one-way 
random effects, absolute agreement, multiple measurements model with a 95% confidence interval. 
According to Koo and Li (2016), the one-way random effects, absolute agreement, multiple 
measurements model is used when each subject is rated by a different set of randomly chosen 
evaluators. According to Kirkwood and Sterne (2003), reliability is considered excellent if 
ICC > 0.75, fair to good if 0.4 < ICC < 0.75, and poor if ICC < 0.4. 
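
The one-way random effects, average measures ICC can be written as (MSB - MSW) / MSB, where MSB and MSW are the between-target and within-target mean squares from a one-way analysis of variance. The sketch below is a simplified illustration that assumes a balanced layout (the same number of ratings per target), which differs from the unequal class sizes in this study; the function names, the example ratings, and the assignment of the boundary values 0.4 and 0.75 to the middle band are all assumptions.

```python
def icc_oneway_average(table):
    """table[i] holds the k ratings given to target i (balanced design assumed)."""
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(table, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / ms_between

def interpret_icc(icc):
    """Bands from Kirkwood and Sterne (2003) as quoted in the text."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.4:  # boundary handling is an assumption
        return "fair to good"
    return "poor"

# Hypothetical ratings: three teachers, each rated by three different students.
table = [
    [5, 6, 5],
    [3, 4, 3],
    [4, 4, 5],
]
icc = icc_oneway_average(table)
print(round(icc, 3), interpret_icc(icc))
```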


Research Results 


The results of the data analyses are presented in two parts: the ADI of the 
students’ evaluations of each teacher per item, and the average measures ICC. 


Mean and Average Deviation Index 


The mean (M) and the average deviation index (ADI) of the students’ evaluations of 
each teacher per item are presented in Table 2. The table shows that the average deviation of the 
students’ evaluations of the teachers on almost all of the 27 items of the questionnaire was less 
than or equal to one. In addition, the mean of the average deviation values for the evaluations of 
each teacher across all of the items was less than one. According to Burke’s (2002) guidelines 
for establishing ranges, the upper-limit cut-off for AD indices for 6-point Likert-type items 
is one. Therefore, the results showed near-perfect interrater agreement among the students 
in their evaluations of each teacher’s teaching. This finding implies that the students’ 
evaluations may be judged to be reliable. 
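
As a small illustration of how the cut-off is applied, the sketch below takes the 27 ADI values for T1 as read from Table 2a, lists the items whose ADI exceeds the cut-off of one, and computes the mean ADI. The variable names are hypothetical, and the values are transcribed from the table rather than recomputed from raw data.

```python
CUTOFF = 1.0  # upper limit for 6-point items, as stated in the text

# ADI values for T1, items 1 to 27, as read from Table 2a.
t1_adi = [
    0.7, 0.7, 0.9, 0.5, 0.7, 0.5, 0.5, 1.0, 0.8,
    0.8, 0.8, 0.3, 1.3, 0.7, 0.7, 0.4, 1.1, 0.2,
    0.9, 0.3, 1.2, 1.0, 0.7, 0.7, 0.8, 1.0, 0.6,
]

# Items are numbered from 1; an item is flagged only if it is strictly above the cut-off.
flagged = [item for item, adi in enumerate(t1_adi, start=1) if adi > CUTOFF]
mean_adi = sum(t1_adi) / len(t1_adi)

print(flagged, round(mean_adi, 1))
```

For T1, three of the 27 items lie above the cut-off while the mean ADI stays below one, which is the pattern the paragraph above describes as "almost all" items meeting the criterion.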

Table 2a. Means and average deviation index of the students’ evaluations per item of the questionnaire (T1 to T4).

                                                                T1 (n=20)  T2 (n=39)  T3         T4
No  My mathematics teacher ...                                     M  ADI     M  ADI     M  ADI     M  ADI
 1  Introduced trigonometric functions in a way that captured    5.4  0.7   5.3  0.5   5.4  0.5   4.5  0.9
    learners’ attention.
 2  Gave definitions of terms/vocabularies that appeared to be   4.9  0.7   4.4  1.1   5.0  1.4   4.9  0.4
    unfamiliar to learners.
 3  Gave satisfactory answers to learners’ questions.            4.8  0.9   5.3  0.6   5.7  0.5   4.7  0.8
 4  Made lessons relevant and meaningful for learners.           5.4  0.5   5.2  0.8   5.5  0.6   5.1  0.5
 5  Simplified the subject matter for learners.                  4.9  0.7   5.2  0.7   5.7  0.4   4.5  0.8
 6  Showed sound knowledge of the subject matter.                5.6  0.5   4.9  0.9   5.5  0.6   4.5  0.9
 7  Showed learners interesting and useful ways of solving       5.5  0.5   5.3  0.7   5.7  0.5   4.9  0.7
    problems.
 8  Started lessons by connecting the content to previous        5.0  1.0   5.3  1.0   5.7  0.4   4.6  1.0
    lessons.
 9  Ended lessons by connecting the content to future lessons.   4.5  0.8   5.1  1.1   5.3  0.6   4.2  0.8
10  Presented sections of the topic in a logical sequence.       4.9  0.8   5.1  0.7   5.3  0.7   4.9  0.5
11  Related content to real life examples.                       4.6  0.8   4.6  1.2   5.3  0.6   3.5  0.9
12  Was always well-prepared for class.                          5.8  0.3   5.8  0.3   5.9  0.1   5.7  0.4
13  Summarised the main points by the end of the lesson.         3.8  1.3   4.9  0.9   4.9  0.8   3.6  0.6
14  Was always in class with all of the necessary materials for  5.4  0.7   5.3  0.7   5.6  0.8   5.1  0.8
    teaching the topic.
15  Related ideas to learners’ prior knowledge.                  4.7  0.7   5.0  1.0   5.6  0.5   4.5  0.9
16  Supported lessons with useful class work.                    5.7  0.4   5.4  0.7   5.7  0.4   5.4  0.6
17  Made use of different teaching techniques.                   4.9  1.1   5.3  0.8   5.4  0.9   4.2  0.8
18  Motivated learners to pay attention to the lesson.           5.9  0.2   5.6  0.5   5.7  0.4   5.0  0.9
19  Helped learners where they didn’t understand.                5.3  0.9   5.8  0.3   5.8  0.4   4.5  0.9
20  Encouraged learners to learn.                                5.8  0.3   5.7  0.4   5.8  0.4   5.2  0.7
21  Gave individual support to learners when needed.             4.7  1.2   5.4  0.7   5.8  0.6   4.2  0.8
22  Adjusted the lessons when learners experienced difficulties  4.5  1.0   5.3  0.7   5.6  0.8   3.9  0.9
    in learning.
23  Used assessment results to provide extra help to learners.   4.8  0.7   5.4  0.7   5.8  0.8   4.4  1.1
24  Explained concepts in different ways to help learners        5.3  0.7   5.7  0.4   5.7  0.9   4.3  1.0
    understand.
25  Took extra steps to help all learners learn and achieve      5.0  0.8   5.5  0.6   5.5  0.8   4.5  1.0
    success in maths.
26  Supported lessons with useful classroom discussions.         4.4  1.0   5.2  0.8   5.6  0.7   3.2  1.2
27  Communicated the topic clearly.                              5.5  0.6     5  1.0   5.6  0.7   4.8  0.7
    Mean                                                         5.1  0.7   5.3  0.7   5.5  0.6   4.5  0.8

Table 2b. Means and average deviation index of students’ evaluations per item of the questionnaire (T5 to T8).

                                                                T5         T6         T7         T8
No  My mathematics teacher ...                                     M  ADI     M  ADI     M  ADI     M  ADI
 1  Introduced trigonometric functions in a way that captured    4.9  1.1   5.3  0.4   5.1  0.6   4.5  0.7
    learners’ attention.
 2  Gave definitions of terms/vocabularies that appeared to be   5.4  0.6   5.2  0.3   4.8  0.7   4.0  0.9
    unfamiliar to learners.
 3  Gave satisfactory answers to learners’ questions.            5.6  0.6   5.3  0.4   5.1  0.6   4.4   1.
 4  Made lessons relevant and meaningful for learners.           5.3  0.7   5.3  0.4   5.1  0.6   4.4  0.9
 5  Simplified the subject matter for learners.                  5.3  0.7   5.3  0.4   4.4  1.1   4.5  1.2
 6  Showed sound knowledge of the subject matter.                5.4  0.5   5.3  0.4   4.6  1.0   4.0  114
 7  Showed learners interesting and useful ways of solving       5.4  0.6   5.1  0.2   5.2  0.8   4.9  0.6
    problems.
 8  Started lessons by connecting the content to previous        5.3  0.6   5.2  0.3   4.6  0.9   4.1  1.2
    lessons.
 9  Ended lessons by connecting the content to future lessons.   5.2  0.9   5.0  0.3   4.3  1.3   3.8  1.1
10  Presented sections of the topic in a logical sequence.       5.0  0.8   5.0  0.2   4.7  0.9   4.3  1.2
11  Related content to real life examples.                       5.7  0.5   4.9  0.3   4.6  1.4   5.2  0.9
12  Was always well-prepared for class.                          5.4  0.7   5.3  0.4   5.1  0.6   5.3  0.9
13  Summarised the main points by the end of the lesson.         5.2  0.7   5.0  0.2   4.7  1.0   414  1.2
14  Was always in class with all of the necessary materials for  5.4  0.6   4.9  0.3   4.9  0.6   4.3  1.4
    teaching the topic.
15  Related ideas to learners’ prior knowledge.                  5.0  0.6   5.0  0.1   4.7  1.0   4.1  0.7
16  Supported lessons with useful class work.                    5.1  1.0   5.2  0.3   5.2  0.8   5.2  0.7
17  Made use of different teaching techniques.                   5.4  0.7   4.9  0.2   4.9  0.8   4.5  0.9
18  Motivated learners to pay attention to the lesson.           5.8  0.2   5.0  0.1   5.1  0.7   5.1  0.9
19  Helped learners where they didn’t understand.                5.6  0.5   5.2  0.3   5.2  0.6   5.1  0.7
20  Encouraged learners to learn.                                5.8  0.3   5.0  0.2   5.1  0.8   5.2  0.9
21  Gave individual support to learners when needed.             5.7  0.5   5.2  0.3   5.3  0.8   5.0  0.8
22  Adjusted the lessons when learners experienced difficulties  5.2  0.9   5.2  0.3   5.0  0.6   4.5  114
    in learning.
23  Used assessment results to provide extra help to learners.   5.7  0.5   5.1  0.1   5.2  0.7   4.3  0.9
24  Explained concepts in different ways to help learners        5.6  0.5   5.1  0.3   5.5  0.6   4.8  0.6
    understand.
25  Took extra steps to help all learners learn and achieve      5.6  0.6   5.0  0.2   5.5  0.5   4.9  0.8
    success in maths.
26  Supported lessons with useful classroom discussions.         5.3  1.2   5.0  0.1   4.8  0.9   4.0  1.3
27  Communicated the topic clearly.                              5.6  0.5   5.1  0.1   4.9  0.8   4.6  1.0
    Mean                                                         5.4  0.7   5.1  0.2   5.0  0.8   4.6  0.9

The average measures ICC, computed using a one-way random effects, absolute 
agreement, multiple raters/measurements model with a 95% confidence interval, is shown in 
Table 3. 


Table 3. Intraclass Correlation Coefficient of the students’ evaluations of the 8 teachers.

                  Intraclass    95% Confidence Interval     F Test with True Value 0
                  Correlation   Lower Bound   Upper Bound   Value   df1   df2   p
Average Measures  .865          .689          .961          7.396   9     70    <.001

One-way random effects model where people effects are random.





The average measures ICC is an index for the reliability of different raters averaged 
together. A high degree of agreement was found among the students’ evaluations; the average 
measures ICC was .865, with a 95% confidence interval from .689 to .961 (F(9, 70) = 7.396, p < .001). This ICC 
value is excellent (Kirkwood & Sterne, 2003), which implies both a high degree of correlation 
and agreement among the students’ evaluations of the teachers’ teachings. Hence, the students’ 
evaluations of the teachers were regarded as reliable. 
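
The reported coefficient can be cross-checked against the F statistic in Table 3. For a one-way random effects model, the average measures ICC equals (MSB - MSW) / MSB, which is the same as 1 - 1/F because F = MSB / MSW. The short sketch below applies this identity to the reported F value; the function name is hypothetical.

```python
def average_measures_icc_from_f(f_value):
    """One-way random effects, average measures: ICC = (MSB - MSW) / MSB = 1 - 1/F."""
    return 1.0 - 1.0 / f_value

icc = average_measures_icc_from_f(7.396)  # F value reported in Table 3
print(round(icc, 3))
```

Rounded to three decimals, the result agrees with the reported ICC of .865.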


Discussion 


This research explored the reliability of students’ evaluation of teaching at secondary 
school level in South Africa using the average deviation index (ADI) and the Intraclass Correlation 
Coefficient (ICC) as measures. The results showed that the average deviation of the students’ 
evaluations of the teachers was less than or equal to one on almost all of the 27 items of the 
questionnaire, and that the mean of the average deviation values across all of the items was 
less than one for each teacher. In addition, an average measures ICC of .865 was obtained. 
Together, the two measures showed that the students’ evaluations of the teachers’ instruction 
were consistent, both item by item and in general. Overall, the results indicate that 
the students’ evaluations of the teachers’ teaching were reliable. This is consistent with the 
findings of Peterson, Wahlquist, and Bone (2000), who have shown that secondary school 
students could give reliable evaluations of their teachers’ educational practices. The finding of 
this study is also in agreement with the result of Irving (2004), who found that secondary school 
students’ evaluation of teachers was reliable in a study conducted in 13 states in the USA. 

This research showed that the students seemed to have a parallel and uniform view of 
what makes teaching meaningful, and that the students in each class seemed to share the same 
experiences. This shared perception of the teachers’ educational practices was evidenced 
by the near-perfect interrater agreement (low ADI) in the students’ evaluations of the teachers 
in all items of the questionnaire. For example, on item 4: “my mathematics teacher made 
lessons relevant and meaningful to learners”, the means of the students’ evaluations of the 
teachers ranged from 4.4 for Teacher 8 (T8) to 5.5 for Teacher 3 (T3), indicating a marked 
difference in the students’ perceptions of T8 and T3 with respect to making lessons relevant and 
meaningful for learners. However, the low average deviation values (.6 for T3 and .9 for T8) 
indicate that the students’ perception of each teacher was consistent, and they were capable of 
discerning the differences between the teachers. Similarly, the mean of the student evaluations 
of the teachers across all the items of the questionnaire ranged from 4.5 for Teacher 4 (T4) to 5.5 
for Teacher 3 (T3), indicating a marked difference in the students’ perceptions of the overall 
educational practices of T4 and T3. In addition, the mean of the average deviation values across 
all of the items (.6 for T3 and .8 for T4) indicates that the students’ perception of each teacher 
was consistent. These findings are in line with the findings of the MET Project (2010) that 
secondary school students’ perceptions of a given teacher’s classroom practices are consistent 
across the different groups of students that they teach. 

The result of this research corroborates the results of Irving (2004), who found that 
SET at secondary school level is reliable when based on (a) the teachers’ commitment to 
students and their learning, (b) mathematical pedagogy, (c) engagement with the curriculum, 
and (d) relating mathematics to the real world. The results of this research further concur 
with those of Kyriakides et al. (2014), who found that SET, which is based on eight 
factors relating to teacher behaviour: orientation, structuring, questioning, teaching modelling, 
application, management of time, teacher’s role in making the classroom a learning environment, 
and assessment, is reliable at primary school level. 

All in all, the SET in this research can be said to be an honest reflection of the students’ 
perceptions of the teachers’ educational practices. This research has contributed to the body 
of knowledge on the reliability of SET at secondary school level, especially in the context of 
South Africa. This finding brings to bear important issues relating to the professional work of 
teachers. It gives an indication that the students who participated in this research seemed to have 
a shared view regarding the nuances of teachers’ professional practice and what is expected of 
a teacher. The nuances relate to the teachers’ knowledge of the subject, lesson preparation and 
presentation, student assessment, and student motivation, as elicited by the SET questionnaire. 


Conclusions 


This research explored the reliability of SET at secondary school level using the 
average deviation index and the Intraclass Correlation Coefficient, computed from data collected 
from 194 students in eight mathematics classes. The results show that there was consistency and 
almost perfect agreement 
among the students’ evaluations of the teachers in all of the items of the instrument. Hence, the 
research has revealed that secondary school students are able to provide reliable evaluation of 
their teachers’ educational behaviour. 

The result of this research lends support to the use of SET as a valuable tool to appraise 
teachers’ educational practices at secondary school level. The findings suggest that, at secondary 
school level, SET might be a reliable tool for the evaluation of the quality of education that 
teachers provide to students. It can also be used to provide feedback to teachers on their 
professional practices. SET results may be used by teachers to understand their students’ 
expectations and perceptions of their instructional practices, and consequently to improve on 
their educational practices. Hence, SET results can serve as an effective formative assessment 
tool for teacher professional development in order to improve student learning. 

Furthermore, SET results can be used by school administrators and the government 
to complement the self and peer evaluations of the Integrated Quality Management System 
(IQMS). SET can further be used to identify teachers’ strengths and weaknesses regarding their 
educational practices based on which policies on teacher training, professional development 
and promotion criteria can be made. 

As noted in the introduction, SET has not previously been used to evaluate teachers’ 
educational practices in South Africa and in many other countries, leading to a dearth of 
literature on SET at school level. This research therefore makes a contribution to the literature 
on the use of SET at secondary school level. 

Finally, it should be noted that this research was carried out with mathematics students 
regarding their teachers and was based on trigonometry as a topic; thus, the results may be 
different in other school subjects or topics and with students in different grade levels. Hence, the 
results should be interpreted within these limits. In addition, there is a need for the research to be replicated 
using other school subjects and topics, and with students in different grades. 